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Abstract 



Numerical methods based on interval arithmetic are efficient means to reliably solve nonlinear systems of equa- 
tions. Algorithm bc3revise is an interval method that tightens variables' domains by enforcing a property called 
box consistency. It has been successfully used on difficult problems whose solving eluded traditional numerical 
methods. We present a new algorithm to enforce box consistency that is simpler than bc3revise, faster, and eas- 
ily data parallelizable. A parallel implementation with Intel SSE2 SIMD instructions shows that an increase in 
performance of up to an order of magnitude and more is achievable. 

CZ2 ! 
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1 Introduction 

oo 

Interval methods [12] are numerical algorithms that use interval arithmetic [11] to avoid rounding error problems 
. intrinsic to floating-point arithmetic [7]. They give enclosures of all solutions of nonlinear systems of equations 

' with the guarantee that no solution is ever lost. 

\Q . Straight interval extensions of classical numerical algorithms such as the Newton method are not well-suited 

to problems with many solutions or with large initial domains for the variables. To tackle these shortcomings, 
elaborate algorithms have been devised in the context of Interval Constraint Programming [1]; they are usually 
employed as the inner stage of a free-steering nonlinear Gauss-seidel method to exclude parts of a variable's 
domain that do not contain zeroes of a unidimensional equation. Domain tightening is achieved by enforcing some 
local consistency property. Box consistency [2] is one such consistency notion, which has been proved efficient in 
handling hard problems whose solving eluded traditional numerical methods for years [6]. It is usually enforced 
by Algorithm bc3revise [2], which combines a binary search with interval Newton steps [11] to isolate leftmost 
and rightmost zeroes of a unidimensional equation in the domain of a variable. 

Thanks to ubiquitous Intel SSE2 SIMD instructions, it is possible to perform many interval operations at 
roughly the same cost as floating-point operations by computing the two bounds of the result in parallel (basic 
interval vectorization) [5]. We outline in Section 2 a novel way to do even better and to compute an interval 
function for two different intervals in parallel (a four times speed-up compared to "sequential" interval evaluation). 

As all interval methods, Algorithm bc3revise — described in Section 3 — can benefit from basic interval vector- 
ization without any modification. On the other hand, it cannot take full advantage of the new arithmetic described 
in Section 2. Hence the introduction of Algorithm sbc in Section 4. 1 : it is a new algorithm that enforces box con- 
sistency by "shaving" domains from the left and right bounds inward. Experiments show that a sequential version 
of sbc is already faster than bc3revise on a set of test problems. We then describe in Section 4.2 an algorithm 
that exploits the potential for a high level of data parallelism in sbc by using SSE2 instructions to perform inter- 
val arithmetic evaluations of functions at four times the speed of a sequential code. Experiments are reported in 
Section 5 and show increases in performances over bc3revise of up to an order of magnitude and more. 
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2 Interval Arithmetic and its Vectorization 



Classical iterative numerical methods suffer from defects such as loss of solutions, absence of convergence, and 
convergence to unwanted attractors due to the use of floating-point numbers (aka floats). At the end of the fifties, 
Moore [11] popularized the use of intervals to control the errors made while computing with floats. Additionnally, 
interval extensions of iterative numerical methods are always convergent. 

In the following, we use the notations sponsored by Kearfott and others [9], where interval quantities are 
boldfaced. 

Interval arithmetic replaces floating-point numbers by closed connected sets of the form J = [J, I] = {a E 
K I L ^ a ^ 1} from the set I of intervals, where 7 and J are floating-point numbers. In addition, each n-ary real 
function <p with domain C M™ is extended to an interval function $ with domain 2?$ C I™ in such a way that 
the containment principle is verified: 



\/A e Vi„\/I e P*: A e I 4>{A) e 
as illustrated by the following example. 

Example 1 The natural interval extensions of addition and multiplication are defined by: 

h + h = [|ii + i2l,TJT + i2T] 



(i) 



h x I 2 = [min(| hhL Ihhl, Ihhl, I hhl), max(tfi/ 2 T, Uihl, Uiht T hhl)] 

where j r J. ( resp., ]V\) is the greatest floating-point number smaller or equal ( resp., the smallest floating-point 
number greater or equal) to r. 

Then, given the real function f(x, y) = x X x + y, we may define its natural interval extension by f(x, y) = 
x x x + y, and we have that, e.g., f([2, 3], [-1, 5]) = [3, 14]. 

Implementations of interval arithmetic use outward rounding to enlarge the domains computed so as not to violate 
the containment principle (1), should some bounds be unrepresentable with floating-point numbers. 

Interval addition, subtraction, multiplication, division, and integral exponentiation may be computed at roughly 
the same speed as their floating-point counterpart thanks to SIMD instructions, and in particular, to Intel SSE2 
instructions. 

Intel SSE2 instructions manipulate 128 bits registers that may be interpreted in various ways. Most notably, 
the registers may pack 2 double precision or 4 single precision floating-point numbers. An SSE2 operator may 
then compute 2 or 4 floating-point operations in parallel (see Figure 1(a)). 
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(a) SIMD floating-point arithmetic 



(Least significant byte to the right} 

(b) Two interval additions 
with one SSE2 instruction 



Figure 1: Floating-point arithmetic and interval arithmetic in SSE2 registers 



The direction of rounding for SSE2 instructions is selected independently of that of the Floating-Point Unit 
(FPU). An SSE2 instruction uses the same rounding for all operations performed in parallel. Nevertheless, thanks 
to simple floating-point properties, it is still possible to write algorithms that compute in parallel the two outward- 
rounded bounds of the result of interval operations. For example, we may use the property: 

J.a + H= - T-a-H 

where a and b are floating-point numbers. 

By storing the negation of the left bound of an interval, and by setting once and for all the rounding direction 
for SSE2 instructions to +oo, the two interval additions [a, b] + [e, f] and [c, d] + [g, h] can be performed by the sole 
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SIMD instruction depicted in Figure 1(b). All the other operators may be defined accordingly. Goualard's paper [5] 
illustrates these principles for the case of basic interval vectorization (two double precision bounds computed in 
parallel). The algorithms to compute four bounds in parallel are new and are reported in an unpublished paper 
currently under review. 

Armed with an interval library whose operators compute two interval operations in parallel, we may evaluate 
the interval extension of a function / for two different intervals for the same cost as one floating-point evaluation 
of /. In the following, we note lf(Ii), /(-Z2)] such a parallel evaluation of / for two different interval arguments. 

3 Box Consistency and the bc3revise Algorithm 

Interval Constraint Programming [1] is a successful approach to reliably isolate all solutions of systems of equa- 
tions. It makes cooperate contracting operators to prune the domains of the variables (intervals with floating-point 
bounds from the set I) with smart propagation algorithms [10] — akin to free-steering nonlinear Gauss-Seidel — to 
ensure consistency among all the constraints. 

The amount of pruning obtained from one equation is controlled by the level of consistency enforced. Box 
consistency [2] is defined as follows: 

Definition 1 (Box consistency) An equation f(x%, . . . , x n ) — is box consistent with respect to a variable Xi 
and a box B — I 1 x • ■ ■ x I n if and only if: 



where I = [J, I] is an interval with floating-point bounds, a + (resp., a~ ) is the smallest floating-point number 
greater than (resp., the largest floating-point number smaller than) the floating-point number a, and f is the 
natural interval extension of f. 

Given a real function / : M. n —> M, and a box of domains B = J x x • • ■ x I n g I™, we define gf : I — > I as 
the ith unary interval projection with respect to B of its interval extension /: 



In the following, we will mostly manipulate gf instead of /. The original real function /, the box B of domains 
considered and/or the variable Xi on which the projection is performed will often be left implicit and omitted from 
the notation of g. 

In order not to lose any potential solution, an algorithm that enforces box consistency of an equation with 
respect to a variable x\ and a box of domains must return the largest domain J? C 1^ that verifies Eq. (2). 



Algorithm 1 Computing a box consistent interval with respect to g the usual way 
[bc3revise] in: g : I — * I; in: I e I 

# Returns the largest interval included in J that is box consistent with respect to g 
begin 

1 Ii <— left_narrow(<7, I) 

2 if h ^ : 

3 I r <- right_narrow(g, [//_,/]) 

4 return □ (1/ u I r ) # returns the smallest interval w.r.t. set inclusion that contains h U I r 

5 else: 

6 return 

end 



Algorithm bc3revise [2], presented by Algorithms 1 and 2, considers the unary projection of an n-ary equation 
on a variable Xi (where all variables but x.- L have been replaced by their current domain) and a domain Ii. It enforces 
box consistency by searching the leftmost and rightmost canonical domains* for which g evaluates to an interval 
containing 0. The search is performed by a dichotomic search aided with Newton steps to accelerate the process. 




(2) 



gf{x) =/(/!,..., 



Ii—\,X, /i+i 



, • ■ -,I n ), i e {1, . . .n). 
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Algorithm 2 Computing a box consistent left bound with Newton steps and a binary search 

[left_narrow] in: g: I — ► I; in: I e I 

# Returns an interval included in / 

# with the smallest left bound I such that e g([l, 
begin 

1 if ^ g(I) : # No solution in / 

2 return 

3 else: 

4 if/ + ^7: #canonical(7): 

5 return I 

6 else: 

7 J<— Newton(g,g',I) # Interval Newton steps 
s if e g([7, 1 + ]): # Box consistent left bound? 

9 return I 

10 else: 

11 (Ji.Ja) «-split(J)# /i <- [I,m(J)],J 2 <- [m(7),7] 

12 J <— left_narrow(g, 

13 if 7 = 0: 

14 return left_narrow(g, J 2 ) 
is else: 

i6 return I 

end 



Algorithm 2 describes the search of a quasi-zero to update the left side of 7j. The procedure righLnarrow to 
update the right bound works along the same lines and is, therefore, omitted. 

Algorithm bc3revise first tries to move the left bound of Ii to the right, and then proceeds to move its right 
bound to the left. The Newton procedure computes a fixpoint of the Interval Newton algorithm [11], where a 
Newton step at iteration j + 1 is: 

with m(J) the midpoint of the interval I' , As in Ratz's work [13], the Newton step uses an extended version of the 
interval division to return a union of two semi-open-ended intervals whenever g' (jw) ) contains 0. The subtraction 
and the intersection operators are modified accordingly. The intersection operator is applied to an interval (/^') 
and a union of two intervals (result of the subtraction), and returns an interval. Figure 2 presents graphically the 
steps performed to enforce box consistency. The encircled numbers label the steps. 

Algorithms bc3revise, lefLnarrow and righLnarrow do not offer opportunities to exploit full data parallelism 
as they do not require close evaluations of the same function over different domains. The same holds for the 
Newton procedure: in the general case, g and g' are different functions, and therefore cannot be evaluated in 
parallel with SIMD instructions. 

4 Box Consistency by Shaving 

To obtain a higher level of data parallelism, we propose a new algorithm to enforce box consistency on the projec- 
tion gf of an equation / = on a variable xf starting from the original domain Ii for xi, consider separately its 
left half and its right half; for the left part, linearize g at Ii as for a Newton step, solve the resulting linear equation 
and intersect the resulting domain with the left half of Jj; do the same for the right half by linearizing g at Jj. A 
new smaller domain that preserves solutions is then obtained; Iterate until reaching a fixpoint. 

*A non-empty interval [a, 6] is canonical if a+ Js b. 

^Note that we are free to choose any point in 7, not the midpoint only. We take advantage of this in the algorithms presented in the next 
section. 
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Figure 2: Enforcing box consistency with bc3revise 



4.1 A Sequential Algorithm: sbc 

Figure 3 illustrates graphically the process just described, and Algorithm 3 presents the actual algorithm. 

Proposition 1 (Termination, Correctness, and Completeness of Sbc) Givenan n-ary equation c: f{x\, . . . , x n ) — 
0, a box ii x • • ■ x I n of domains, and a projection c: gi(x) = of c, we have: 

Termination. The call to sbc(gi, Ii) always terminates; 

Correctness. The equation c is box consistent with respect to Xi and sbc{gi, If); 
Completeness. No solution is lost during the tightening process: 

V(n, . . . ,r n ) £ h x • ■ ■ x I n : f(n, ...,r n ) = =>■ n 6 Sbc(gi,li). 

Proof. In the following, the interval I corresponds to the domain Ii ofxi. 

(Termination). Algorithm sbc terminates in any case since we always tighten I in the loop 2-23 (either by splitting it on 
Line 3, or by tightening Ii and I r with Newton steps on Lines 10 and 19). Since we consider intervals with floating-point 
bounds, of which there are finitely many, we have to reach canonicity of I eventually. At that point, either gi(I) contains 0, 
and we have reached box consistency, or it does not, and we can safely narrow I to 0, which both make us leave the loop. 

( Correctness). We leave the loop 2-23 if I is empty or if the canonical intervals at left and right bounds contain solutions 
of gi (Lines 5 and 14). In the latter case, we have the two conditions of Eq (2) for box consistency; the former case occurs if 
both Ii and I r do not contain a solution of c (Lines 7 and 16) or if the Newton steps on Lines 10 and 19 narrow them down to 
0. By correctness of the interval Newton method, this case only occurs if, once again, Ii and I r do not contain a solution of c. 

( Completeness ). By completeness of the Newton operator, as tightening only occurs either through Newton steps, or by 
discarding intervals that have been proved on Lines 5, 7, 14, or 16 not to contain solutions. □ 

For each iteration of the loop 2-23 in Algorithm sbc, we have to compute g for intervals [Jj, Ii + ], [I r , Ir], h, 
I r , \L, Ii], and [I r , I r ]. We also have to compute g' for intervals Ii and I r . All these evaluations are candidates 
for parallelization with SIMD instructions, as presented in the next section. 
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Figure 3: Domain reduction obtained with one iteration of the loop in sbc 



4.2 An SIMD Algorithm for Box Consistency: vsbc 

Algorithm 4 presents a modification of Algorithm sbc to make good use of its higher level of data parallelism 
thanks to the SIMD interval arithmetic that has been presented in Section 2. The evaluations of g and g' are 
reordered to appear in pairs that can be evaluated in parallel. In addition, we reuse the evaluation of g{[Ii , Ij ]) 

and g([I r ,I r ]) of Line 4 for the Newton steps of Line 22 instead of g([Ii, iil) and g([I r ,I r ]) as was done in 
Algorithm sbc. This choice avoids two evaluations of g at the cost of potentially slightly decreasing the tightening 
ability of the Newton step. The domain computed is unaffected by this optimization. In particular, box consistency 
is still obtained eventually. 

At each iteration of the loop between Lines 2 and 24, we perform 4 interval evaluations of g and 2 interval 
evaluations of g' for the same cost as 2 floating-point evaluations of / and 1 floating-point evaluation of /'. 

5 Experiments 

To evaluate the impact of our new algorithms, we have selected 20 instances of 12 classical test problems. Some 
are polynomial and others are not. The characteristics of these test problems are summarized in Table 1. All 
problems are structurally well constrained, with as many equations as variables. Column "Size" reports the num- 
ber of equations/variables. Column "Equations" indicates whether all equations are polynomial (quadratic, if no 
polynomial has a degree greater than 2). A problem is labelled "non-polynomial" if at least one constraint contains 
a trigonometric, hyperbolic or otherwise transcendental operator. All test problems are presented on the COPRIN 
web page [8]. 

All experiments were conducted on an Intel Core2 Duo T5600 1.83GHz. The Whetstone test [3] for this 
machine reports 1111 MIPS with a loop count equal to 100, 000. All algorithms have been implemented in an 
in-house C++ solver, with gaol [4] as the underlying interval arithmetic library. The SIMD interval arithmetic 
presented above has been implemented from scratch using Intel intrinsic instructions. In its current state, the 
library only contains vectorized versions of the addition, subtraction, multiplication, division, and integral power. 
All other SIMD functions are emulated with sequential interval arithmetic. As a consequence, only polynomial 
equation systems are entirely solved in an SIMD environment at present. 

Table 2 reports the time spent in seconds to isolate all solutions of the test problems in domains with a width 
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Algorithm 3 Enforcing box consistency by shaving 
[sbc] in: g : I -> I; in: I e I 

# Returns the largest interval included in I that is box consistent w.r.t. g 
begin 

1 (left -consistent, right -consistent) <— (false, false) 

2 do: 



3 (Ii,I P )«-split(I) 

4 if -left. consistent; # Updating the left bound 

5 if i g([Ii, # I not box consistent to the left? 

6 I t «- [7/_ + ,7)] # Considering the remainder of 

7 if ^ # No solution in the remainder of Ip. 

8 7j <- 

9 else: 

10 7* ^ J, n (7, - g(\Ii,Ii})/g'{Ii)) # One Newton step 

11 else: 

12 left consistent *— true 

13 if -aright .consistent. # Updating the right bound 

14 if0^g([77 ,7r]): # I not box consistent to the right? 
is J r <— |7r,^ ] # Considering the remainder of I r 

16 if ^ g(-Zr): # No solution in the remainder of I r l 

17 I r <- 

18 else: 

19 7 r <- J r n (77 - g([7r,77|)/g'(I r )) # One Newton step 

20 else: 

21 right .consistent <— true 

22 J <— □ (Jj u J r ) # Returns the "hull" [min^T^max^, I r )] of the union 



23 while (J 0) A (-^left .consistent V aright .consistent) 

24 return J 

end 



smaller than 10~ 6 , starting from the standard domains given on the COPRIN web page. An entry "TO" indicates a 
time-out (more than 30 minutes, here). Column bc3revise presents the results obtained with Algorithm bc3revise 
implemented with double precision interval arithmetic on the FPU; Column bc3vd corresponds to Algorithm 
bc3revise where interval arithmetic is performed in double precision with SSE2 instructions (basic vectorization); 
Column bc3vf corresponds to Algorithm bc3revise where interval arithmetic is performed in single precision with 
SSE2 instructions (we still perform only one interval operation per SSE2 instruction, using only the lower half of 
SSE2 registers, though); Column sbc corresponds to Algorithm sbc implemented with double precision interval 
arithmetic on the FPU; Column sbcvd corresponds to Algorithm sbc where interval arithmetic is performed in 
double precision with SSE2 instructions; Column vsbc corresponds to Algorithm vsbc where interval arithmetic 
is performed in single precision with SSE2 instructions (two interval operations are performed in parallel); lastly, 
Column vsbc&sbcvd corresponds to the cooperation of vsbc and sbcvd: vsbc is used until the domains are all 
smaller than a size fixed empirically to 0.25; sbcvd is used afterwards. 

6 Discussion 

As can be seen from Column sbc of Table 2, enforcing box consistency by shaving is faster than with Algo- 
rithm bc3revise on all problems of our test set, the ratio bc3revise/sbc ranging from 1.9 to 17.9 and beyond. 
We also believe that sbc is simpler to understand and easier to implement correctly than bc3revise. 
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Algorithm 4 A data parallel algorithm for box consistency enforcement 



[vsbc] in: g. I —> I; in: J e I 

# Returns the largest interval included in 7 that is box consistent w.r.t. g 
begin 

1 (left .consistent, right .consistent) <— (false, false) 

2 do: 

3 (ii,r P )«-spiit(i) 

4 {J h J r )^lg(\Il,lL + ]),9(&~X})l 

5 if ^ J*: # 7 not box consistent to the left? 

6 7j <— [-^ + ,I7] # Considering the remainder of h 

7 else: 

8 left. consistent <— true 

9 if ^ J r : # 7 not box consistent to the right? 

10 7 r «— [1^,77 ] # Considering the remainder of 7 r 

11 else: 

12 right .consistent «— true 

13 if —left. consistent V -aright .consistent: 

14 (ld,Jif P )4-fe(J,) )ff (I r )] 

is if ^ ifj : # First checking an obvious absence of solution in I t 

16 7j <— 

17 if ^ 7<" r : # First checking an obvious absence of solution in 7 r 

18 I r <— 

19 # Performing 2 Newton steps in parallel to update both bounds 

20 # For better performances, we reuse J L and J r 

21 # instead of g fl7j_, 7J) and g( [17, 17]) 

22 (J,,J P ) <- [I, n ([7 1 ,7 i +] - Jt/g'ilt)),^ n (pT,£] - J P /g'(7 P ))] 

23 7 <— □ (Jj u 7 P ) # Returns the "hull" [min(7j_,7 i .),rnax(77,7r)] of the union 

24 while (7 ^ 0) A (-'left. consistent V -aright consistent) 

25 return 7 

end 



Table 1 : Test problems characteristics 



Name 


Code 


Size 


Equations 


Bronstein 


bro 


3 


quadratic 


Broyden-banded 


bb 


100, 500, and 1 000 


quadratic 


Broyden tridiagonal 


bt 


10 and 20 


quadratic 


Combustion 


comb 


10 


polynomial 


Discrete Boundary Value Function 


dbvf 


10 and 30 


polynomial 


Extended Freudenstein 


ef 


30 and 50 


polynomial 


Mixed Algebraic Trigonometric 


mat 


3 


non-polynomial 


More-Cosnard 


mc 


50 and 100 


polynomial 


Robot 


rob 


8 


quadratic 


Trigexp 3 


te3 


5 000 


non-polynomial 


Troesch 


tro 


50, 100, and 200 


non-polynomial 


Yamamura 


yam 


6 and 8 


polynomial 
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Table 2: Experiments 



Problem 


bc3revise 


bc3vd 


bc3vf 


sbc 


sbcvd 


vsbc 


vsbcvd&sbcvd 


bro 


2.6 


0.9 


0.4 


0.3 


0.3 


0.06 


0.2 


bb 100 


10.4 


3.3 


3.2 


2.7 


1.2 


0.7 


0.8 


hh 500 

UU -J \J\J 


123.7 


39.1 


27.0 


24.8 


10.8 


5.3 


5.5 


hh 1 000 

L/C -l \J\J\J 


280.2 


88.7 


56.8 


55.1 


24.0 


11.1 


11.4 


ht 1 

Ul 1 V/ 


1 6 Q 


5 8 


3 4 


2. 1 


1 


0.4 


5 


for 20 


1127.4 


382.2 


260.3 


141.1 


66.7 


28.6 


30.4 


C- 1/ / l IU 


1.4 


0.6 


0.5 


0.2 


0.08 


0.03 


0.08 


dhvf 1 

Lit/ I'/ \-\J 


1.7 


0.5 


0.6 


0.3 


0.1 


0.08 


0.1 


dhvf 30 

Lit/ I'/ ~J \J 


42.7 


13.4 


20.4 


7.7 


3.2 


4.7 


3.8 V 


pf 30 
<y JKJ 


2.1 


0.8 


TO 


1.1 


0.5 


TO 


0.3 


pf SO 


5.1 


1.8 


TO 


2.2 


0.9 


TO 


0.6 


lYlClt 


?6 9 


1 3 6 


1 3 ? 

1J.Z. 




1.1 


0.4 


1 


tyic 50 


23.7 


6.5 


5.7 


11.0 


4.4 


3.0 


4.1 


mc 100 


175.5 


49.0 


46.2 


86.9 


36.8 


28.0 


35.6 


rob 


4.5 


1.5 


4.3 


0.8 


0.4 


1.0 


0.3 


te3 5000 


7.5 


5.1 


17.1 


2.9 


2.7 


3.3 


3.8 V 


fro 50 


24.8 


15.3 


29.1 


4.4 


3.6 


2.6 


3.4 


fro 100 


180.1 


112.7 


384.2 


30.9 


25.4 


33.7 


24.2 


fro 200 


1341.4 


844.4 


TO 


231.1 


188.1 


TO 


181 


yam 6 


14.6 


4.3 


4.2 


7.3 


2.5 


1.2 


1.7 


yam 8 


279.1 


84.6 


104.0 


91.0 


35.1 


26.1 


27.7 



Times in seconds on an Intel Core2 Duo T5600 1.83GHz (whetstone 100 000=1111 MIPS) 
Best times in bold blue. Time out JO set to 1 800 s. 

Numbers followed by the symbol "V" correspond to cases for which vsbcvd&sbc performs worse than sbcvd alone. 



Basic vectorization of interval arithmetic improves speed by up to three times (see bc3revise vs. bc3vd and 
sbc vs. sbcvd) at no cost since algorithms do not have to be modified in order to benefit from it. 

If we take advantage of the data parallelism inherent to Algorithm sbc to vectorize interval evaluations, leading 
to Algorithm vsbc, we obtain even better results on all problems but dhvf '30, ef 30, ef 50, rob, te3 5000, tro 100, 
and fro 200. All other things being equal, if we emulate SIMD instructions with double precision floating-point 
operations, we obtain back the times of sbc, which means that the very size of single precision floating-point 
numbers is the culprit here: as we vectorize 2 interval instructions with SSE2 registers, we must switch from 
double precision floats used in the rest of the program to single precision floats (see Figure 1(b)). The cast leads to 
less precision in the computation, which in turn has an impact on the ability to reject domains having no solutions. 
The same problem occurs for bc3vf. 




2**+(10/9}-sinh(10-x)-v 

S'y+(10/g)'Stnh{10'y)-x-1 

(a) Extended Freudenstein 2 (b) Troesch 2 

Figure 4: Slow convergence with single precision floating-point numbers 

As a consequence, the exploration algorithm has more branching to perform to isolate solutions. This incurs 
an increase in the running time that may be drastic for ill-conditioned problems such as Troesch or Extended 
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Freudenstein: as we may see in Figure 4 for the case of 2 equations and 2 variables, the curves for these two 
problems are almost tangent to each other and to the rry-plane on a large surface. Each equation considered 
separately leads to the computation of many quasi-zeroes that cannot be removed easily by the other equation of 
the problem. 

There is currently no easy cure to this problem as microprocessor makers do not seem to be ready to offer 
SIMD instructions on 4 double precision floats any time soon. It is still possible to quickly isolate regions of 
interest in "large" domains using vsbc, and then switch to sbcvd to polish the results and obtain tighter domains 
if necessary. Column vsbc&sbcvd in Table 2 shows that this procedure indeed removes the time-out problems of 
vsbc on ill-conditioned problems, while still preserving better performances compared to sbcvd alone. The best 
cooperation scheme that maximizes performances as much as possible still remains to be found, though. 
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